Towards Good Practices for Deep 3D Hand Pose Estimation
Authors
Abstract
3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Deep convolutional networks (ConvNets) with sophisticated designs have recently been applied to it, but their improvement over traditional random-forest-based methods is not clear-cut. To identify good practices and improve hand pose estimation performance, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolutional outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and the smooth L1 loss, the proposed REN significantly improves the ability of the ConvNet to localize hand joints. Experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.
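The region-ensemble idea and the smooth L1 loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 2x2 grid, the feature-map shape, the hidden size, the joint count, the random weights, the omission of biases, and all function names are assumptions chosen only for illustration. The smooth L1 loss uses the common form with a threshold of 1, which the abstract does not specify.

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 loss: quadratic for |d| < 1, linear otherwise (common form)."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

def split_into_grid(features, rows=2, cols=2):
    """Partition a (C, H, W) feature map into rows*cols flattened regions."""
    _, h, w = features.shape
    regions = []
    for hs in np.array_split(np.arange(h), rows):
        for ws in np.array_split(np.arange(w), cols):
            patch = features[:, hs[0]:hs[-1] + 1, ws[0]:ws[-1] + 1]
            regions.append(patch.ravel())
    return regions

def region_ensemble_forward(features, region_weights, fusion_weight):
    """One FC regressor (with ReLU) per region, then one FC layer that fuses them.

    Biases are omitted to keep the sketch short.
    """
    regions = split_into_grid(features)
    hidden = [np.maximum(r @ w, 0.0) for r, w in zip(regions, region_weights)]
    return np.concatenate(hidden) @ fusion_weight

# Hypothetical sizes, not taken from the paper.
rng = np.random.default_rng(0)
num_joints, hidden_dim = 14, 32
features = rng.standard_normal((8, 6, 6))   # stand-in for the last conv output
region_dim = 8 * 3 * 3                      # channels * region height * region width
region_weights = [rng.standard_normal((region_dim, hidden_dim)) * 0.01
                  for _ in range(4)]
fusion_weight = rng.standard_normal((4 * hidden_dim, 3 * num_joints)) * 0.01

pred = region_ensemble_forward(features, region_weights, fusion_weight)
target = np.zeros(3 * num_joints)           # placeholder 3D joint coordinates
loss = smooth_l1(pred, target)
```

Here `pred` holds one (x, y, z) triple per joint, flattened into a single vector, and `loss` is the scalar that training would minimize under these assumptions.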
Similar resources
Pose Estimation Errors, the Ultimate Diagnosis
This paper proposes a thorough diagnosis for the problem of object detection and pose estimation. We provide a diagnostic tool to examine the impact in the performance of the different types of false positives, and the effects of the main object characteristics. We focus our study on the PASCAL 3D+ dataset, developing a complete diagnosis of four different state-of-the-art approaches, which spa...
Using a single RGB frame for real time 3D hand pose estimation in the wild
We present a method for the real-time estimation of the full 3D pose of one or more human hands using a single commodity RGB camera. Recent work in the area has displayed impressive progress using RGBD input. However, since the introduction of RGBD sensors, there has been little progress for the case of monocular color input. We capitalize on the latest advancements of deep learning, combining ...
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D dept...
Evaluation of Deep Learning based Pose Estimation for Sign Language
Human body pose estimation and hand detection being the prerequisites for sign language recognition(SLR), are both crucial and challenging tasks in Computer Vision and Machine Learning. There are many algorithms to accomplish these tasks for which the performance measures need to be evaluated for body posture recognition on a sign language dataset, that would serve as a baseline to provide impo...
Large-scale Multiview 3D Hand Pose Dataset
Accurate hand pose estimation at joint level has several uses on human-robot interaction, user interfacing and virtual reality applications. Yet, it currently is not a solved problem. The novel deep learning techniques could make a great improvement on this matter but they need a huge amount of annotated data. The hand pose datasets released so far present some issues that make them impossible ...
Journal: CoRR
Volume: abs/1707.07248
Publication year: 2017